58 research outputs found

    Taking Document Structure into Account for the Discovery of Unexpected Information

    National audience. In this article we are interested in taking document structure into account in a process of discovering unexpected information within a corpus of textual documents. Following initial work aimed at designing and implementing unexpectedness measures in a system called UnexpectedMiner, we have sought to improve its performance by taking into account the structure of the analyzed documents. Each part of a document is weighted by coefficients whose values are determined by an optimization algorithm. These coefficients are then integrated into the unexpectedness measures used by UnexpectedMiner to determine whether or not a document is unexpected. The performance of our new system is evaluated, highlighting the improvements brought about by taking document structure into account.
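
    As a rough illustration of the weighting idea only (not UnexpectedMiner's actual measures), the sketch below combines per-section unexpectedness scores with section coefficients and tunes those coefficients with a naive grid search; the section names, the toy scores, and the optimization loop are all assumptions.

import itertools

SECTIONS = ["title", "abstract", "body"]  # assumed document structure

def document_score(section_scores, weights):
    """Weighted unexpectedness of one document (higher = more unexpected)."""
    return sum(weights[s] * section_scores[s] for s in SECTIONS)

def accuracy(docs, labels, weights, threshold=0.5):
    """Fraction of documents whose thresholded score matches the expert label."""
    preds = [document_score(d, weights) >= threshold for d in docs]
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)

def grid_search_weights(docs, labels, step=0.25):
    """Naive stand-in for the optimization step: coarse grid search on coefficients."""
    best, best_acc = None, -1.0
    grid = [i * step for i in range(int(1 / step) + 1)]
    for combo in itertools.product(grid, repeat=len(SECTIONS)):
        weights = dict(zip(SECTIONS, combo))
        acc = accuracy(docs, labels, weights)
        if acc > best_acc:
            best, best_acc = weights, acc
    return best, best_acc

# Toy data: per-section unexpectedness scores assumed to be computed elsewhere.
docs = [{"title": 0.9, "abstract": 0.7, "body": 0.2},
        {"title": 0.1, "abstract": 0.2, "body": 0.3}]
labels = [True, False]  # True = judged unexpected
print(grid_search_weights(docs, labels))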

    A Theoretical Framework for Managing Large Pattern Bases

    National audience. Data mining algorithms are now able to process large volumes of data, but users are often overwhelmed by the quantity of patterns generated. Moreover, in some cases, whether for confidentiality or cost reasons, users may not have direct access to the data and may only have the patterns at their disposal. They then no longer have the possibility of refining the mining process from the initial data in order to extract more specific patterns. To remedy this situation, one solution is to manage the patterns themselves. In this article we therefore present a theoretical framework that allows a user to manipulate, as a post-processing step, a previously extracted collection of patterns. We propose to represent the collection as a graph that the user can then exploit, using algebraic operators, to retrieve existing patterns or to search for new ones.
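
    A minimal sketch of the graph view of a pattern collection, assuming itemset patterns and only two toy operators (selection and downward navigation); the paper's actual algebra and graph model are richer than this.

# Patterns are frozensets of items; edges link a pattern to its immediate supersets.
patterns = [frozenset(p) for p in
            [{"a"}, {"b"}, {"a", "b"}, {"a", "c"}, {"a", "b", "c"}]]

# Build the graph: each pattern points to the patterns extending it by one item.
children = {p: [q for q in patterns if p < q and len(q) == len(p) + 1]
            for p in patterns}

def select(predicate):
    """Selection operator: keep the patterns satisfying a predicate."""
    return [p for p in patterns if predicate(p)]

def specializations(p):
    """Navigation operator: all patterns reachable below p in the graph."""
    out, stack = set(), [p]
    while stack:
        cur = stack.pop()
        for child in children.get(cur, []):
            if child not in out:
                out.add(child)
                stack.append(child)
    return out

print(select(lambda p: "a" in p))          # patterns containing item "a"
print(specializations(frozenset({"a"})))   # everything that refines {a}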

    Efficient Management of Non Redundant Rules in Large Pattern Bases: a Bitmap Approach

    International audience. Knowledge Discovery from Databases has an ever-increasing impact nowadays, and various tools are now available to efficiently extract knowledge (in terms of time and memory space) from huge databases. Nevertheless, these systems generally produce large pattern bases, and managing them rapidly becomes intractable. Few works have focused on pattern base management systems, and research in this domain is still very recent. This paper falls within that context and deals with a particular class of patterns, namely association rules. More precisely, we present how we have efficiently implemented the search for non-redundant rules thanks to a representation of rules as bitmap arrays. Experiments show that this technique dramatically increases the gain in time and space, allowing us to manage large pattern bases.
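
    The bitmap idea can be sketched as follows, with one bit per transaction so that supports and rule measures reduce to bitwise ANDs and popcounts; the toy data, the simplified redundancy test, and the data structures are assumptions, not the paper's implementation.

# Toy transaction database.
transactions = [{"a", "b"}, {"a", "b", "c"}, {"b", "c"}, {"a", "c"}]

# One integer bitmap per item: bit i is set if transaction i contains the item.
items = sorted({x for t in transactions for x in t})
bitmap = {x: sum(1 << i for i, t in enumerate(transactions) if x in t)
          for x in items}

def cover(itemset):
    """Bitmap of the transactions containing every item of the itemset."""
    bits = (1 << len(transactions)) - 1
    for x in itemset:
        bits &= bitmap[x]
    return bits

def support(itemset):
    """Number of set bits in the cover (popcount)."""
    return bin(cover(itemset)).count("1")

def confidence(antecedent, consequent):
    return support(antecedent | consequent) / support(antecedent)

def is_redundant(antecedent, consequent):
    """Simplified test: a more general antecedent already yields the same cover."""
    full = cover(antecedent | consequent)
    return any(cover((antecedent - {x}) | consequent) == full
               for x in antecedent if len(antecedent) > 1)

print(support({"a", "b"}), confidence({"a"}, {"b"}))
print(is_redundant({"a", "b"}, {"c"}))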

    Correct your Text with Google

    To appear in the Proceedings of the International Conference on Web Intelligence, IEEE, 2007. International audience. With the increasing amount of text files produced nowadays, spell checkers have become essential tools for the everyday tasks of millions of end users. Over the years, several tools showing decent performance have been designed. Of course, grammatical checkers may improve text correction; nevertheless, this requires large resources. We think that basic spell checking may be improved by using the Web as a corpus and by taking into account the context of words identified as potential misspellings. We propose to use the Google search engine and some machine learning techniques in order to design a flexible and dynamic spell checker that may evolve over time with new linguistic features.
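
    A hedged sketch of the context-based idea: rank candidate corrections of a suspicious word by how frequent the surrounding phrase is on the Web. The web_hit_count function below is a stub standing in for whatever search API is available (it is not an actual Google API), and the candidate generation is deliberately minimal.

def web_hit_count(phrase: str) -> int:
    """Stub: hit count for an exact phrase query (replace with a real search API)."""
    fake_counts = {"will definately come": 300, "will definitely come": 2500000}
    return fake_counts.get(phrase, 0)

def candidates(word):
    """Candidate corrections at edit distance 1 (deletions and substitutions only)."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    substitutions = {left + c + right[1:]
                     for left, right in splits if right for c in letters}
    return deletes | substitutions | {word}

def correct_in_context(left, word, right):
    """Pick the candidate whose phrase '<left> <candidate> <right>' is most frequent."""
    return max(candidates(word),
               key=lambda c: web_hit_count(f"{left} {c} {right}".strip()))

print(correct_in_context("will", "definately", "come"))  # -> 'definitely' with the stub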

    Accurate Visual Features for Automatic Tag Correction in Videos

    International audience. We present a new system for video auto-tagging which aims at correcting the tags provided by users for videos uploaded on the Internet. Unlike most existing systems, our proposal uses neither the questionable textual information nor any supervised learning system to perform tag propagation. We propose to directly compare the visual content of the videos, described by different sets of features such as Bag-of-Visual-Words or frequent patterns built from them. We then propose an original tag correction strategy based on the frequency of the tags in the visual neighborhood of the videos. Experiments on a YouTube corpus show that our method can effectively improve the existing tags and that frequent patterns are useful for constructing accurate visual features.
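
    A minimal sketch, on made-up data, of correcting tags from a video's visual neighborhood: videos are compared through Bag-of-Visual-Words histograms and a tag is kept or added when it is frequent among the nearest neighbors. The similarity measure, the neighborhood size, and the frequency threshold are assumptions, not the paper's exact strategy.

import math

def cosine(a, b):
    """Cosine similarity between two Bag-of-Visual-Words histograms."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

videos = {
    "v1": {"bow": [5, 0, 2, 1], "tags": {"cat", "funny"}},
    "v2": {"bow": [4, 1, 2, 0], "tags": {"cat"}},
    "v3": {"bow": [0, 6, 0, 3], "tags": {"car"}},
    "v4": {"bow": [5, 0, 3, 1], "tags": {"dog", "funny"}},  # 'dog' is a wrong tag
}

def corrected_tags(vid, k=2, min_freq=0.5):
    """Keep or add a tag iff at least min_freq of the k nearest neighbors carry it."""
    me = videos[vid]
    neighbors = sorted((v for v in videos if v != vid),
                       key=lambda v: cosine(me["bow"], videos[v]["bow"]),
                       reverse=True)[:k]
    counts = {}
    for v in neighbors:
        for t in videos[v]["tags"]:
            counts[t] = counts.get(t, 0) + 1
    return {t for t, c in counts.items() if c / k >= min_freq}

print(corrected_tags("v4"))  # visually close to v1/v2, so 'cat'/'funny' dominate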

    Unsupervised Video Tag Correction System

    National audience. We present a new system for video auto-tagging which aims at correcting and completing the tags provided by users for videos uploaded on the Internet. Unlike most existing systems, we do not learn any tag classifiers or use the questionable textual information to compare our videos. We propose to directly compare the visual content of the videos, described by different sets of features such as Bag-of-Visual-Words or frequent patterns built from them. Then, we propagate tags between visually similar videos according to the frequency of these tags in a given video's neighborhood. We also propose a controlled experimental setup to evaluate such a system. Experiments show that, with suitable features, we are able to correct a reasonable amount of tags in Web videos.
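
    One detail mentioned above, frequent patterns built from visual words, can be sketched as follows on toy data: binarized Bag-of-Visual-Words sets are mined for frequent itemsets, which then serve as binary features. The support threshold, the pattern size limit, and the binarization rule are assumptions, not the papers' exact construction.

from itertools import combinations

# Binarized BoW: the set of visual words present in each video (toy data).
bows = [{"w1", "w3"}, {"w1", "w2", "w3"}, {"w1", "w3", "w4"}, {"w2", "w4"}]

def frequent_itemsets(transactions, min_support=0.5, max_size=2):
    """Naive enumeration of itemsets occurring in >= min_support of the videos."""
    items = sorted({w for t in transactions for w in t})
    frequent = []
    for size in range(1, max_size + 1):
        for cand in combinations(items, size):
            sup = sum(set(cand) <= t for t in transactions) / len(transactions)
            if sup >= min_support:
                frequent.append((frozenset(cand), sup))
    return frequent

patterns = frequent_itemsets(bows)

def pattern_features(bow):
    """Binary feature vector: one bit per frequent pattern contained in the BoW."""
    return [int(p <= bow) for p, _ in patterns]

print(patterns)
print(pattern_features({"w1", "w3"}))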

    Teaching Experiments and Programming for Machine Learning


    Sequence Mining Without Sequences: a New Way for Privacy Preserving

    International audience. During the last decade, sequential pattern mining has been at the core of numerous research efforts. It is now possible to efficiently discover users' behavior in various domains, such as purchases in supermarkets, Web site visits, etc. Nevertheless, classical algorithms do not respect individuals' privacy, since they exploit personal information (name, IP address, etc.). We provide an original solution to privacy preservation by using a probabilistic automaton instead of the original data. An application to car flow modeling is presented, showing the ability of our algorithm to discover frequent routes without any individual information. A comparison with SPAM shows that, even though we sample from the automaton, our approach is more efficient.
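
    An illustrative sketch of the automaton idea: a tiny probabilistic automaton stands in for the anonymized model, and symbol sequences are sampled from it so that frequent routes can be estimated without any individual data. The states, symbols, and probabilities below are made up (e.g. road segments in a car-flow setting).

import random
from collections import Counter

# state -> list of (probability, emitted symbol, next state); next state None = stop.
automaton = {
    "start": [(0.6, "A", "s1"), (0.4, "B", "s2")],
    "s1":    [(0.7, "C", "s2"), (0.3, "D", None)],
    "s2":    [(1.0, "D", None)],
}

def sample_sequence(rng):
    """Draw one symbol sequence by walking the automaton from 'start'."""
    state, seq = "start", []
    while state is not None:
        r, acc = rng.random(), 0.0
        for prob, symbol, nxt in automaton[state]:
            acc += prob
            if r <= acc:
                seq.append(symbol)
                state = nxt
                break
    return tuple(seq)

rng = random.Random(0)
samples = [sample_sequence(rng) for _ in range(1000)]
# Frequent routes are then estimated from the samples, with no personal data involved.
for seq, count in Counter(samples).most_common(3):
    print(seq, count / len(samples))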
    • …